HMM-based Indic handwritten word recognition using zone segmentation
Abstract
This paper presents a novel approach to Indic handwritten word recognition using zone-wise information. Because of the complexity introduced by compound characters, modifiers, and overlapping and touching characters, among other factors, character segmentation and recognition is a tedious task in Indic scripts (e.g. Devanagari, Bangla, Gurumukhi, and other similar scripts). To avoid character segmentation in such scripts, HMM-based sequence modeling has previously been used in a holistic way. This paper proposes an efficient word recognition framework that segments handwritten word images horizontally into three zones (upper, middle, and lower) and recognizes the corresponding zones. The main aim of this zone segmentation approach is to reduce the number of distinct component classes compared to the total number of classes in Indic scripts. As a result, zone segmentation enhances the recognition performance of the system. The components in the middle zone, where characters are mostly touching, are recognized using an HMM. After the middle zone is recognized, HMM-based Viterbi forced alignment is applied to mark the left and right boundaries of the characters. Next, the residual components, if any, in the upper and lower zones within their respective boundaries are combined to achieve the final word-level recognition. A water reservoir feature is integrated into this framework to reduce zone segmentation and character alignment defects during segmentation. A novel sliding window-based feature, called Pyramid Histogram of Oriented Gradient (PHOG), is proposed for middle zone recognition. PHOG features have been compared with other existing features and found to be robust for Indic script recognition. Exhaustive experiments are performed on two Indic scripts, namely Bangla and Devanagari, for performance evaluation. The experiments show that the proposed zone-wise recognition improves accuracy with respect to the traditional way of Indic word recognition.
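To make the sliding-window PHOG idea more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a grayscale middle-zone image stored as a 2-D NumPy array, weights orientation bins by gradient magnitude instead of using an edge detector, and uses hypothetical names and parameters (`phog`, `sliding_phog`, `win_width`, `step`, `levels`, `bins`) that do not come from the paper. Each window yields one feature vector, so a word image becomes a left-to-right sequence of vectors of the kind an HMM could consume.

```python
import numpy as np


def phog(window, levels=2, bins=8):
    """Pyramid histogram of oriented gradients for one 2-D grayscale window.

    Assumption: histograms are weighted by gradient magnitude; the paper's
    exact formulation may differ.
    """
    win = window.astype(float)
    gy, gx = np.gradient(win)                      # derivatives along rows, columns
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation in [0, pi)
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    h, w = win.shape
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level                         # cells x cells grid at this level
        ys = np.linspace(0, h, cells + 1).astype(int)
        xs = np.linspace(0, w, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                b = bin_idx[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                m = mag[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                feats.append(np.bincount(b, weights=m, minlength=bins))

    vec = np.concatenate(feats)
    total = vec.sum()
    return vec / total if total > 0 else vec       # L1-normalise non-empty windows


def sliding_phog(image, win_width=16, step=4, levels=2, bins=8):
    """Slide a full-height window left to right; return one PHOG vector per position."""
    _, w = image.shape
    starts = range(0, max(w - win_width, 0) + 1, step)
    return np.stack([phog(image[:, s:s + win_width], levels, bins) for s in starts])
```

With `levels=2` and `bins=8`, each vector has 8 × (1 + 4 + 16) = 168 dimensions. The window width and step control how many frames a word produces, and would need tuning against the stroke width of the script in question.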
Similar resources
Zone-based Keyword Spotting in Bangla and Devanagari Documents
In this paper we present a word spotting system in text lines for offline Indic scripts such as Bangla (Bengali) and Devanagari. Recently, it was shown that a zone-wise recognition method improves word recognition performance over conventional full-word recognition systems in Indic scripts [29]. Inspired by this idea, we consider the zone segmentation approach and use middle zone information ...
Holistic Farsi handwritten word recognition using gradient features
In this paper we address the issue of recognizing Farsi handwritten words. Two types of gradient features are extracted from a sliding vertical stripe which sweeps across a word image. These are directional and intensity gradient features. The feature vector extracted from each stripe is then coded using the Self Organizing Map (SOM). In this method each word is modeled using the discrete Hidde...
Cross-language Framework for Word Recognition and Spotting of Indic Scripts
Handwritten word recognition and spotting for low-resource scripts are difficult because sufficient training data is not available and collecting data for such scripts is often expensive. This paper presents a novel cross-language platform for handwritten word recognition and spotting for such low-resource scripts where training is performed with a sufficiently large dataset of an available scr...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into computers and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Unconstrained Arabic Online Handwritten Words Segmentation using New HMM State Design
In this paper we propose a segmentation system for unconstrained Arabic online handwriting. Segmentation is an essential problem addressed by analytical word recognition systems. The system is composed of two stages: the first is a specially designed hidden Markov model (HMM) and the second is a rule-based stage. In our system, handwritten words are broken up into characters by simultaneous segmentati...
Journal title:
- Pattern Recognition
Volume 60, Issue
Pages -
Publication date 2016